Reg. No.

## MANIPAL INSTITUTE OF TECHNOLOGY (A constituent unit of MAHE, Manipal)

## I SEMESTER M.TECH. (COMPUTER SCIENCE AND ENGINEERING)

## END SEMESTER EXAMINATIONS, DECEMBER 2023

## HIGH PERFORMANCE COMPUTING SYSTEMS [CSE 5116]

Date: 07/12/2023 [9:30 AM - 12:30 PM]

MAX. MARKS: 50 Time: 3 Hours

Instructions: Answer ALL the questions.

| No. | Question | Marks |
|-----|----------|-------|
| 1A. | Explain the use of the reduction clause in OpenMP. Write an OpenMP program that reads an array and applies this clause to calculate the sum of all the elements in the array in parallel. | 5M |
| 1B. | Sketch a relevant block diagram to explain the concept of shared memory in symmetric multiprocessor systems. | 3M |
| 1C. | Define the node degree and network diameter metrics of a static interconnection network of SIMD computers. | 2M |
| 2A. | Implement an MPI program to meet the following: The master process initializes an array and then distributes an equal portion of that array to the other processes. After the other processes receive their portion of the array, they perform an addition operation on each array element. They also maintain a sum for their portion of the array. The master process does likewise with its portion of the array. An appropriate MPI call is used to collect the sums maintained by each process. Finally, the master process displays the global sum of the array elements. Assume that the number of elements is evenly divisible by the number of processes. | 5M |
| 2B. | Interpret and explain the different built-in MPI reduction operators. Write the appropriate API call using any one of the MPI reduction operators. | 3M |
| 2C. | Construct a skeleton of an MPI program to show how deadlock can occur when using point-to-point communication routines. | 2M |
| 3A. | Organize and list the steps in the implementation of an OpenCL program with the supporting APIs. Just name the APIs in every step. | 5M |
| 3B. | Construct an OpenCL kernel for multiplying two matrices A and B of size $M \times N$ and $N \times P$. Each row of A has to be multiplied by a separate work item. | 5M |
| 4A. | Implement a parallel algorithm to sum $n$ values using the Shuffle-exchange SIMD model, where $n$ is the number of values to be added and $p$ is the number of processors in the model. It is assumed that each processor initially holds one of the $n$ values. | 4M |
| 4B. | For question 4A, assume a Shuffle-exchange SIMD model with $n = p = 8$. Demonstrate how such addition happens using appropriate diagrams and a suitable example. | 3M |
| 4C. | Discuss message passing and shared-memory models in the context of parallel programming models. | 3M |

| No. | Question | Marks |
|-----|----------|-------|
| 5A. | Design an $8 \times 8$ baseline multistage network. Explain your design methodology. | 4M |
| 5B. | Implement a kernel in CUDA to multiply two matrices A and B of dimensions $M \times N$ and $N \times P$, resulting in matrix $C$. Create $P$ threads, and each column of the resultant matrix is to be computed by one thread. | 4M |
|     |          | 2M |